TWITTER EVENT NETWORKS AND THE SUPERSTAR MODEL 



SHANKAR BHAMIDI AND J. MICHAEL STEELE AND TAUHID ZAMAN

Abstract. Motivated by "condensation" phenomena often observed in social networks 
such as Twitter where one "superstar" vertex gains a positive fraction of the edges, while 
the remaining empirical degree distribution still exhibits a power law tail, we formulate a 
mathematically tractable model for this phenomenon which provides a better fit to
empirical data than the standard preferential attachment model across an array of networks
observed in Twitter. Using embeddings in an equivalent continuous time version of the 
process, and adapting techniques from the stable age-distribution theory of branching 
processes, we prove limit results for the proportion of edges that condense around the
superstar, the degree distribution of the remaining vertices, maximal non-superstar degree
asymptotics, and height of these random trees in the large network limit. 



1. Retweet Graphs and a mathematically tractable Model 

Our goal here is to provide a simple model that captures the most salient features of 
a natural graph that is determined by the Twitter traffic generated by public events. In 
the Twitter world (or Twitterverse), each user has a set of followers; these are people who
have signed up to receive the tweets of the user. Here our focus is on retweets; these are
tweets by a user who forwards a tweet that was received from another user. A retweet is
sometimes accompanied by comments from the retweeter.

Let us first start with an empirical example which contains all the characteristics
observed in a wide array of such retweet networks. Data was collected during the Black
Entertainment Television (BET) Awards of 2010. We first considered all tweets in the
Twitterverse that were posted between 10 AM and 4 PM (GMT) on the day of the
ceremony, and we then restricted attention to all the tweets in the Twitterverse that contained
the term "BET Awards." We view the posters of these tweets as the vertices of an
undirected simple graph where there is an edge between vertices v and w if w retweets a tweet
received from v, or vice versa. We call this graph the retweet graph.

In the retweet graph for the 2010 BET Awards one finds a single giant component (see 
Figure 1.1). There are also many small components (with five or fewer vertices) and a 
large number of isolated vertices. The giant component is also approximately a tree in
the sense that if we remove 91 edges from the graph of 1724 vertices and 1814 edges we
obtain an honest tree. Finally, the most compelling feature of this empirical tree is that
it has one vertex of exceptionally large degree. This "superstar" vertex has degree 992, so
it is connected to more than 57% of the vertices. As it happens, this "superstar" vertex
corresponds to the pop celebrity Lady Gaga, who received an award at the ceremony.

Figure 1.1. Giant component of the 2010 BET Awards retweet graph.

1.1. Superstar Model for the giant component. Our main observation is that the 
qualitative and quantitative features of the giant component of the retweet graph may be
captured rather well by a simple one-parameter model. The construction of the model 
only makes an obvious modification of the now classic preferential attachment model, but 
this modification turns out to have richer consequences than its simplicity would suggest. 
Naturally, the model has the "superstar" property baked into the cake, but a surprising 
consequence is that the distribution of the degrees of the non-superstar vertices is totally 
different from what one finds in the preferential attachment model. 

To construct the model we consider a graph evolution process that we denote by
$\{G_n, n = 1, 2, \ldots\}$. The graph $G_1$ consists of the single vertex $v_0$, and we call
$v_0$ the superstar. The graph $G_2$ then consists of the superstar $v_0$, a non-superstar
$v_1$, and an edge between the two vertices. For $n \ge 2$, we then construct $G_{n+1}$ from
$G_n$ by attaching the vertex $v_n$ to the superstar with probability $0 < p < 1$, while with
probability $q = 1 - p$ we attach $v_n$ to a non-superstar according to the classical
preferential attachment rule. That is, with probability $q$ the non-superstar $v_n$ is attached
to one of the non-superstars $\{v_1, v_2, \ldots, v_{n-1}\}$, and given that $v_n$ is attached to
a non-superstar, it is attached to the vertex $v_i$, $1 \le i \le n-1$, with probability that is
proportional to the degree of $v_i$ in $G_n$.
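The construction above can be simulated directly. The following is a minimal sketch rather than the code used for the paper: the function name `simulate_superstar`, the list-based bookkeeping, and the `seed` argument are our own illustrative choices. It keeps a list in which every non-superstar appears once per unit of degree, so that a uniform draw from the list is a degree-proportional draw.

```python
import random


def simulate_superstar(n_vertices, p, seed=None):
    """Grow a Superstar Model tree on vertices 0..n_vertices-1.

    Vertex 0 plays the role of the superstar v_0. Returns (degree, parent),
    where parent[v] is the vertex that v attached to (None for the superstar).
    """
    rng = random.Random(seed)
    degree = [1, 1]      # G_2: the superstar v_0 joined to v_1 by one edge
    parent = [None, 0]
    # Each non-superstar is listed once per unit of degree, so a uniform
    # choice from `targets` is a preferential-attachment choice.
    targets = [1]
    for v in range(2, n_vertices):
        if rng.random() < p:
            u = 0                       # attach to the superstar
        else:
            u = rng.choice(targets)     # degree-proportional non-superstar
            targets.append(u)           # u gains one unit of degree
        degree[u] += 1
        degree.append(1)                # the new vertex starts with degree 1
        parent.append(u)
        targets.append(v)
    return degree, parent


if __name__ == "__main__":
    deg, _ = simulate_superstar(20000, 0.3, seed=1)
    print("superstar degree fraction:", deg[0] / len(deg))
```

With a moderately large number of vertices the printed fraction should sit close to the chosen p, in line with the condensation described above.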

1.2. Organization of the paper. The rest of the paper is organized as follows. In the 
next section, we state the main mathematical results for the Superstar Model. We discuss
previous work analyzing Twitter networks and the connection between the model analyzed
in this paper and existing models in Section 3. In Section 4 we study the performance 
of this model on various real networks constructed from the Twitterverse and compare 
this to the standard preferential attachment model. Section 5 is the heart of the paper 
where we construct a special two type continuous time branching process which turns out 
to be equivalent to the Superstar Model and analyze various structural properties of this 
continuous time model. In Section 6 we prove the equivalence between the continuous time 
model and the Superstar Model through a surgery operation. In Section 7 we complete 
the proofs of all the main results by using the equivalence between the two models and the 
proven properties of the continuous time model to read off results for the Superstar Model. 

2. Mathematical Results for the Superstar Model 

Let $\{G_n, n = 1, 2, \ldots\}$ denote a graph process that follows the Superstar Model
with parameter $0 < p < 1$. We shall think of all the processes as constructed on a single
probability space through the obvious sequential growth mechanism so that one can make
almost sure statements. As before, the first vertex $v_0$ is called the "superstar" and the
remaining vertices are non-superstars. The degree of the vertex $v$ in the graph $G$ is denoted
by $\deg(v, G)$. The first result describes the asymptotics of the condensation phenomenon
around the superstar.

Theorem 2.1 (Superstar Strong Law). With probability one, we have

$$\lim_{n \to \infty} \frac{1}{n} \deg(v_0, G_n) = p. \tag{2.1}$$

The next result describes the asymptotic degree distribution.

Theorem 2.2 (Degree Distribution Strong Law). With probability one we have

$$\lim_{n \to \infty} \frac{1}{n}\, \mathrm{card}\{1 \le j \le n : \deg(v_j, G_n) = k\} = \nu_{SM}(k, p),$$

where $\nu_{SM}(\cdot, p)$ is the probability mass function defined by

$$\nu_{SM}(k, p) = \frac{2-p}{1-p}\,(k-1)! \prod_{i=1}^{k} \left(i + \frac{2-p}{1-p}\right)^{-1}.$$



Remark 2.3. This theorem implies that the degree distribution of the non-superstar vertices
has a power law tail. Specifically,

$$\frac{2-p}{1-p}\,(k-1)! \prod_{i=1}^{k} \left(i + \frac{2-p}{1-p}\right)^{-1} \sim C_p\, k^{-\alpha}$$

as $k \to \infty$ for the constants $\alpha = (3-2p)/(1-p)$ and $C_p = (2-p)/(1-p)\,\Gamma(\alpha)e^{2+\alpha}$.
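As a numerical sanity check on Theorem 2.2 and Remark 2.3, the sketch below evaluates $\nu_{SM}(k, p)$ through log-gamma functions, using the identity $\prod_{i=1}^{k}(i + c) = \Gamma(k+1+c)/\Gamma(1+c)$, and estimates the tail exponent from the local log-log slope between $k$ and $2k$. The helper names `nu_sm` and `tail_exponent` are ours, and the check concerns only the exponent $\alpha$, not the constant $C_p$.

```python
import math


def nu_sm(k, p):
    """Limiting degree pmf nu_SM(k, p) of Theorem 2.2, via log-gamma."""
    c = (2.0 - p) / (1.0 - p)
    # c * (k-1)! / prod_{i=1..k} (i + c), computed on the log scale
    log_val = (math.log(c) + math.lgamma(k) + math.lgamma(1.0 + c)
               - math.lgamma(k + 1.0 + c))
    return math.exp(log_val)


def tail_exponent(p, k=10_000):
    """Estimate the power-law exponent from the slope between k and 2k."""
    return -math.log(nu_sm(2 * k, p) / nu_sm(k, p)) / math.log(2.0)


if __name__ == "__main__":
    for p in (0.0, 0.3, 0.5):
        alpha = (3.0 - 2.0 * p) / (1.0 - p)
        print(p, tail_exponent(p), alpha)
```

For $p = 0$ the function reduces to $4/(k(k+1)(k+2))$, the preferential attachment law, which gives a second quick check.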

The next theorem concerns the largest degree amongst all the non-superstar vertices
$\{v_i : 1 \le i \le n\}$. Let

$$\Upsilon_n := \max_{1 \le i \le n} \deg(v_i, G_n).$$

Theorem 2.4 (Maximal non-superstar degree). Let $\gamma = (1-p)/(2-p)$. There exists
a non-degenerate strictly positive finite random variable $\Delta^*$ such that with probability one
we have

$$\lim_{n \to \infty} n^{-\gamma}\, \Upsilon_n = \Delta^*.$$

The almost sure linear growth of the degree of the superstar (Theorem 2.1) is to be 
expected from our construction. The scaling of the second largest degree vertex underscores 
a notable divergence from the preferential attachment model where the maximal degree 
grows at the rate $O(n^{1/2})$ [20].

Recall that $G_n$ is a tree. We shall think of this tree as rooted at the superstar $v_0$. Let
$\mathcal{H}(G_n)$ denote the graph distance of the vertex furthest from the root. Call this the
height of $G_n$. Theorem 2.1 implies that a fraction $p$ of the network is directly connected to
the superstar. One immediately wonders whether this reflects a general property of the network:
is the height $\mathcal{H}(G_n) = O_P(1)$ as $n \to \infty$? The next theorem shows that in fact
the height of the tree increases logarithmically in the size of the network.

Theorem 2.5 (Logarithmic height scaling). Let $W(\cdot)$ be the Lambert special function
with $W(1/e) \approx 0.2784$. Then with probability one we have

$$\lim_{n \to \infty} \frac{\mathcal{H}(G_n)}{\log n} = \frac{1}{W(1/e)(2-p)}.$$

3. Related results and questions 

The fields of social networks and attachment models have witnessed an explosive growth 
over the last few years. In this Section we briefly discuss the connections between this model 
and some of the more standard models in the literature as well as extensions of the results 
in the paper. We also discuss previous empirical research done on the structure of Twitter 
networks. 

(a) Preferential attachment: This has become one of the standard workhorses in the 
complex networks community. It is almost impossible to provide even a partial list of 
references but see [7] for bringing this model to the attention of the networks community, 
[22], [13] for survey level treatments of a wide array of models, [9] for the first rigorous results 
on the asymptotic degree distribution, and [11], [8], [26], and [14] and the references therein 
for more general models and results. Restricting ourselves to the simplest case, one starts 
with two vertices connected by a single edge as in the Superstar Model and then each new 
vertex joins the system by connecting to a single vertex in the current tree by choosing 
this vertex with probability proportional to its degree. In this case, one can show ([9]) that 
there exists a limiting asymptotic degree distribution such that with probability one 

$$\lim_{n \to \infty} \frac{1}{n}\, \mathrm{card}\{1 \le j \le n : \deg(v_j, G_n) = k\} = \frac{4}{k(k+1)(k+2)},$$

thus exhibiting a degree exponent of three. The Superstar Model changes the degree
exponent of the non-superstar vertices from three to $(3-2p)/(1-p)$ (see Theorem 2.2 and
Remark 2.3). Further, for the preferential attachment model the maximal degree scales like
$n^{1/2}$ ([20]), while for the Superstar Model, the maximal non-superstar degree scales like
$n^{\gamma}$ with $\gamma = (1-p)/(2-p)$.

(b) Statistical estimation: We use real data on various Twitter streams to analyze the 
empirical performance of the Superstar Model and compare this with typical preferential
attachment models in Section 4. Estimating the parameters from the data raises a host
of new interesting statistical questions. See [27] where such questions were first raised and 
likelihood based schemes were proposed in the context of usual preferential attachment 
models. Considering how often such models are used to draw quantitative conclusions 
about real networks, proving consistency of such procedures as well as developing
methodology to compare different estimators in the context of models of evolving networks would
be of great interest to a number of different fields. 

(c) Stable age distribution: The proofs for the degree distribution build heavily on the 
analysis of the stable age distribution for a single type continuous time branching process 
in [21]. We extend this analysis to the context of a two type variant whose evolution 
mirrors the discrete type model. Using Perron-Frobenius theory a wide array of structural 
properties are known about such models (see [17]). The models used in our proof
technique are relatively simple and we can give complete proofs using special properties of the
continuous time embeddings, including special martingales which play an integral role in 
the treatment (see e.g. Proposition 5.3). There have been a number of recent studies on 
various preferential attachment models using continuous time branching processes, see e.g. 
[25, 5, 12]. For the usual preferential attachment model ($p = 0$), [24], using embeddings in
continuous time and results on the first birth time in such branching processes [18], shows
that the height satisfies

$$\frac{\mathcal{H}(G_n)}{\log n} \xrightarrow{\text{a.s.}} \frac{1}{2W(1/e)}.$$

We use a similar technique, but we first need to extend [18] to the multi-type setting of
relevance to us.

(d) Previous analysis of Twitter networks: The majority of work analyzing Twitter 
networks has been empirical in nature. In one of the earliest studies of Twitter networks 
[19] the authors looked at the degree distribution of the different networks in Twitter, 
including retweet networks associated with individual topics. Power-laws were observed, 
but no model was proposed to describe the network evolution. In [4] the link between 
maximum degree and the range of time for which a topic was popular or "trending" was 
investigated. Correlations between the degree in retweet graphs and the Twitter follower 
graph for different users were studied in [10]. These empirical analyses provided many
important insights into the structure of networks in Twitter. However, the lack of a model 
to describe the evolution of these networks is one of the important unanswered questions 
in this field, and the rigorous analysis of such a model has not even been considered yet. 
Our work here presents one of the first such models which produces predictions that match 
Twitter data and also is given a rigorous theoretical analysis. 

4. Retweet Graphs for Different Public Events 

We collected tweets from the Twitter firehose for thirteen different public events, such 
as sports matches and musical performances [1]. The Twitter firehose is the full feed 
of all public tweets which is accessed via Twitter's Streaming Application Programming 
Interface [2]. By using the Twitter firehose, we were able to access all public tweets in the 
Twitterverse. 

For each public event $E \in \{1, 2, \ldots, 13\}$, we kept only tweets which have an event specific
term and used those tweets to construct the retweet graph which we denote $G_E$. Our
analysis focuses on the giant component of the retweet graph, which we denote $G_E^0$. In
Table 4.1 we present important properties of each retweet graph's giant component such
as the number of vertices, number of edges, maximal degree, and the Twitter name of the
superstar corresponding to the maximal degree. A more detailed description of each event,
including the event specific term, can be found in the Appendix.

Table 4.1. For each event $E$, we list the number of vertices ($|V(G_E^0)|$),
number of edges ($|E(G_E^0)|$), and maximal degree ($d_{\max}(G_E^0)$) in the giant
component $G_E^0$, along with the Twitter name of the superstar corresponding
to the maximal degree.

The sizes of the giant components range from 239 to 7365 vertices. The giant components 
are not trees, but are very tree-like. As can be seen in the table, for each giant component, 
the deletion of a small number of edges will result in an honest tree. 

4.1. Maximal degree. The maximal degree in the retweet graphs is larger than would
be expected under preferential attachment. Let us call the number of vertices in the giant
component $n = |V(G_E^0)|$. For a preferential attachment graph with $n$ vertices it is known
that the maximal degree scales as $n^{1/2}$. Figure 4.1 shows a plot of the maximal degree in
the giant component $d_{\max}(G_E^0)$ and a plot of $n^{1/2}$ versus $n$ for the retweet graphs. It can
be seen from the figure that the sublinear growth predicted by preferential attachment
does not capture the superstar effect in these retweet graphs.
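For a sense of scale, the BET Awards giant component described in Section 1 (1724 vertices, maximal degree 992) gives the following back-of-the-envelope comparison. This is only our arithmetic on the figures quoted there, not an additional empirical result.

```latex
\[
  n^{1/2} = \sqrt{1724} \approx 41.5,
  \qquad
  d_{\max} = 992 \approx 0.575\, n ,
  \qquad
  \frac{d_{\max}}{n^{1/2}} \approx 24 .
\]
```

The observed maximal degree is thus more than an order of magnitude above the $n^{1/2}$ scale, while it is a constant fraction of $n$, which is the behavior described by Theorem 2.1.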

4.2. Estimating p and the degree distribution. The Superstar Model degree
distribution is known once the superstar parameter $p$ is specified. We are interested in seeing if
for each event $E$ this model can predict the degree distribution in $G_E^0$. For an event $E$ and
degree $k \in \{1, 2, \ldots\}$ we define the empirical degree distribution of the giant component as

$$\hat{\nu}_E(k) = \frac{1}{|V(G_E^0)|}\, \mathrm{card}\left\{v \in V(G_E^0) : \deg(v, G_E^0) = k\right\}.$$

Figure 4.1. Plot of $d_{\max}(G_E^0)$ versus $n = |V(G_E^0)|$ for the retweet graphs
of each event. The events are labeled with the same numbers as in Table
4.1. Also shown is a plot of $n^{1/2}$.

To predict the degree distribution using the Superstar Model, we need a value for $p$. We
estimate $p$ for each event $E$ as $\hat{p}(E) = d_{\max}(G_E^0)/|V(G_E^0)|$. Using $p = \hat{p}(E)$ we obtain the
Superstar Model degree distribution prediction for each event $E$ and degree $k$, $\nu_{SM}(k, p)$
from Theorem 2.2. We also compare $\hat{\nu}_E(k)$ to the preferential attachment
degree distribution $\nu_{PA}(k) = 4\,(k(k+1)(k+2))^{-1}$ [9]. Figure 4.2 shows the empirical
degree distribution for the retweet graphs of four of the events, along with the predictions 
for the two models. As can be seen, the Superstar Model predictions seem to qualitatively 
match the empirical degree distribution better than preferential attachment. To obtain a 
more quantitative comparison of the degree distribution we calculate the relative error of
these models for each value of degree $k$. The relative error for event $E$ and degree $k$ is
defined as $\mathrm{relerror}_{SM}(k, E) = |\nu_{SM}(k, p) - \hat{\nu}_E(k)|\,(\nu_{SM}(k, p))^{-1}$ for the Superstar Model
and $\mathrm{relerror}_{PA}(k, E) = |\nu_{PA}(k) - \hat{\nu}_E(k)|\,(\nu_{PA}(k))^{-1}$ for preferential attachment. In Figure
4.3 we show the relative errors for different values of $k$. As can be seen, the relative error
of the Superstar Model is lower than that of preferential attachment for degrees $k = 1, 2, 3, 4$
and for all of the events, with the exception of $k = 4$ and $E = 7$. There is a clear connection
between the superstar degree and the degree distribution in the giant component of these 
retweet graphs which is captured well by the Superstar Model. 
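The estimation and comparison steps of this section can be sketched as follows. This is an illustration under our own assumptions rather than the authors' pipeline: the function names are hypothetical, and `toy_degrees` is a made-up degree sequence of a ten-vertex tree, not Twitter data. The estimator is the ratio of maximal degree to number of vertices, and the two relative errors follow the definitions given above.

```python
from collections import Counter


def nu_sm(k, p):
    """nu_SM(k, p) from Theorem 2.2, evaluated as a running product."""
    c = (2.0 - p) / (1.0 - p)
    value = c
    for i in range(1, k + 1):
        # factor i contributes (i - 1) to (k-1)! (for i >= 2) and 1/(i + c)
        value *= (i - 1 if i > 1 else 1) / (i + c)
    return value


def nu_pa(k):
    """Preferential attachment limit 4 / (k (k+1) (k+2))."""
    return 4.0 / (k * (k + 1) * (k + 2))


def compare_models(degrees, ks=(1, 2, 3, 4)):
    """Return p_hat and rows (k, relerror_SM, relerror_PA)."""
    n = len(degrees)
    p_hat = max(degrees) / n              # maximal degree over vertex count
    counts = Counter(degrees)
    rows = []
    for k in ks:
        emp = counts[k] / n               # empirical degree distribution
        sm, pa = nu_sm(k, p_hat), nu_pa(k)
        rows.append((k, abs(sm - emp) / sm, abs(pa - emp) / pa))
    return p_hat, rows


# Hypothetical degree sequence (ten vertices, degrees summing to 18).
toy_degrees = [5, 3, 2, 2, 1, 1, 1, 1, 1, 1]

if __name__ == "__main__":
    p_hat, rows = compare_models(toy_degrees)
    print("p_hat =", p_hat)
    for k, err_sm, err_pa in rows:
        print(k, err_sm, err_pa)
```

On a sequence this small the errors carry no statistical meaning; the example only fixes the order of operations (estimate p, evaluate both model predictions, then normalize the absolute error by the model value).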



5. Analysis of a special two type branching process 

The proofs of the theorems of Section 2 exploit a special two-type continuous time
branching process together with a simple surgery that proves the equivalence between
this construction and the Superstar Model. We start by describing this construction and
proving the equivalence between the two models. We shall then derive various properties 
(degree distribution, height and maximal degree) of the continuous time version and show 
how these results carry over to the Superstar Model. 

Figure 4.2. Plots of the empirical degree distribution for the giant component
of the retweet graphs ($\hat{\nu}_E(k)$), and the estimates of the Superstar
Model ($\nu_{SM}(k, \hat{p}(E))$) and preferential attachment ($\nu_{PA}(k)$) for four
different events. Each plot is labeled with the event specific term and $\hat{p}(E)$.

5.1. A two type continuous branching process. We now consider a two-type
continuous time branching process $\mathrm{BP}(t)$ whose types we call red and blue. We use $|\mathrm{BP}(t)|$ for
the total number of individuals in the population by time $t$. In the construction, every
individual survives forever so there is no distinction between living and dead individuals.
We shall also let $\{\mathrm{BP}(t)\}_{t \ge 0}$ be the associated filtration of the process. At time $t = 0$ we
begin with a single red vertex which we call $v_1$. For any fixed time $0 \le t < \infty$, let $V_t$
denote the vertex set of $\mathrm{BP}(t)$. Each vertex $v \in V_t$ in the branching process tree gives birth
according to a Poisson process with rate

$$\lambda(v, t) = c_B(v, t) + 1,$$

where $c_B(v, t)$ is equal to the number of blue children of vertex $v$ at time $t$. Also let $c_R(v, t)$
denote the number of red children of vertex $v$ by time $t$. At the moment of a new birth,
the new child vertex is colored red with probability $p$ and colored blue with probability
$q = 1 - p$. There are no deaths of vertices, and all vertices continue to procreate through
all time. For $t \ge 0$, write $R(t)$ and $B(t)$ for the total number of red and blue vertices
respectively in $\mathrm{BP}(t)$. Finally for $n \ge 1$, define the stopping times

$$\tau_n = \inf\{t : |\mathrm{BP}(t)| = n\}. \tag{5.1}$$

Since the counting process $|\mathrm{BP}(t)|$ is a non-homogeneous Poisson process with a rate that
is always greater than or equal to one, we see that for any $n \ge 1$, the stopping times $\tau_n$
are almost surely finite.
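The two-type process just described can be simulated event by event. The sketch below is our own illustration, with hypothetical names `simulate_bp` and `scaled_size`: since each vertex gives birth at rate (number of blue children) + 1, the total birth rate equals the length of a list in which every vertex appears that many times, and a uniform draw from the list selects the parent with the correct probability. The helper `scaled_size` applies the normalization $e^{-(2-p)t}(|\mathrm{BP}(t)| + B(t))$ that is studied in Section 5.2.

```python
import math
import random


def simulate_bp(t_max, p, seed=None):
    """Run the two-type branching process up to time t_max.

    Returns (colors, blue_children); index 0 is the red root v_1.
    """
    rng = random.Random(seed)
    colors = ["red"]
    blue_children = [0]
    slots = [0]      # vertex v is listed c_B(v) + 1 times
    t = 0.0
    while True:
        # total birth rate is the sum of (c_B(v) + 1), i.e. len(slots)
        t += rng.expovariate(len(slots))
        if t > t_max:
            break
        parent = rng.choice(slots)
        child = len(colors)
        if rng.random() < p:
            colors.append("red")
        else:
            colors.append("blue")
            blue_children[parent] += 1
            slots.append(parent)     # parent's birth rate goes up by one
        blue_children.append(0)
        slots.append(child)          # every new vertex starts at rate one
    return colors, blue_children


def scaled_size(colors, blue_children, t, p):
    """exp(-(2 - p) t) * (|BP(t)| + B(t)) for a finished run."""
    return math.exp(-(2.0 - p) * t) * (len(colors) + sum(blue_children))


if __name__ == "__main__":
    runs = [simulate_bp(2.0, 0.5, seed=s) for s in range(2000)]
    mean = sum(scaled_size(c, b, 2.0, 0.5) for c, b in runs) / len(runs)
    print("average scaled size:", mean)
```

Averaged over many independent runs, the scaled size should stay near one for every horizon, which is what the martingale property established in Section 5.2 predicts.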

5.2. Elementary properties of the branching process. By construction of the
process, every new vertex is independently colored red with probability $p$ and blue with
probability $1 - p$. In particular the number of blue vertices $B(t)$ is just a time change of a
random walk with Bernoulli($1 - p$) increments. Thus by the strong law of large numbers,
we have

$$\frac{B(t)}{|\mathrm{BP}(t)|} \xrightarrow{\text{a.s.}} 1 - p \quad \text{as } t \to \infty. \tag{5.2}$$

Before moving onto an analysis of the branching process, we introduce the Yule process. 

Definition 5.1 (Rate $a$ Yule process). Fix $a > 0$. A rate $a$ Yule process is defined as
a pure birth process $\mathrm{Yu}_a(\cdot)$ which starts with a single individual $\mathrm{Yu}_a(0) = 1$ with the
rate of creating a new individual proportional to the number of present individuals in the
population with

$$\mathbb{P}(\mathrm{Yu}_a(t + dt) - \mathrm{Yu}_a(t) = 1 \mid \mathrm{Yu}_a(t)) = a\, \mathrm{Yu}_a(t)\, dt.$$

The Yule process is well studied and the next Lemma collects some of its standard 
properties (see [23], Section 2.5). 

Lemma 5.2 (Yule process). 

(a) For any $t > 0$, $\mathrm{Yu}_a(t)$ has a geometric distribution with

$$\mathbb{P}(\mathrm{Yu}_a(t) = k) = e^{-at}(1 - e^{-at})^{k-1}, \quad k \ge 1.$$

(b) The process $e^{-at}\, \mathrm{Yu}_a(t)$ is an $L^2$ bounded martingale with respect to the natural filtration
of the process. Thus $e^{-at}\, \mathrm{Yu}_a(t) \xrightarrow{\text{a.s.}} W$, where $W$ has an exponential distribution with
mean one.

Now define the process

$$M(t) = e^{-(2-p)t}\left(|\mathrm{BP}(t)| + B(t)\right), \quad 0 \le t < \infty.$$

Proposition 5.3 (Asymptotics for $\mathrm{BP}(t)$). The process $\{M(t)\}_{t \ge 0}$ is a positive $L^2$ bounded
martingale with respect to the natural filtration $\{\mathrm{BP}(t)\}_{t \ge 0}$ and thus converges to a random
variable, $M(t) \to W^*$ almost surely and in $L^2$, with $\mathbb{E}(W^*) = 1$. The random variable
$W^* > 0$ with probability one. By (5.2),

$$\lim_{t \to \infty} e^{-(2-p)t}\, |\mathrm{BP}(t)| = \frac{W^*}{2-p} := W \quad \text{with probability one.} \tag{5.3}$$

Proof. Write $Z(t) = |\mathrm{BP}(t)|$ and $Y(t) = Z(t) + B(t)$ so that $M(t) = e^{-(2-p)t}\, Y(t)$. We
shall denote $dM(t) = M(t + dt) - M(t)$. Then

$$dM(t) = e^{-(2-p)t}\, dY(t) - (2-p)\, e^{-(2-p)t}\, Y(t)\, dt. \tag{5.4}$$

Note that the processes $Z(t)$, $B(t)$ are counting processes which increase by increments
of one. For such processes, we shall use the infinitesimal notation $\mathbb{E}(dZ(t) \mid \mathrm{BP}(t)) = a(t)\, dt$
to denote the fact that $Z(t) - \int_0^t a(s)\, ds$ is a local martingale.

Now the counting process $Z(t) = |\mathrm{BP}(t)|$ evolves by jumps of size one with

$$\mathbb{P}(dZ(t) = 1 \mid \mathrm{BP}(t)) = \Big(\sum_{v \in \mathrm{BP}(t)} (c_B(v, t) + 1)\Big)\, dt,$$

where $c_B(v, t)$ denotes the number of blue children of vertex $v$ at time $t$. The number
of blue vertices can be written as $B(t) = \sum_{v \in \mathrm{BP}(t)} c_B(v, t)$ since every blue vertex is an
offspring of a unique vertex in $\mathrm{BP}(t)$ and is counted exactly once in this sum. Thus using
the rate description, we get the conditional expectation

$$\mathbb{E}(dZ(t) \mid \mathrm{BP}(t)) = (Z(t) + B(t))\, dt.$$

Since $B(t) \le Z(t)$, we see that the rate of producing new individuals is bounded by $2|\mathrm{BP}(t)|$.
Thus the process $|\mathrm{BP}(t)|$ can be stochastically bounded by a Yule process with $a = 2$. This
implies by Lemma 5.2 that for all $t > 0$, $\mathbb{E}(|\mathrm{BP}(t)|^2) < \infty$.

Let us now analyze the process $B(t)$. This process increases by one when the new vertex
born into $\mathrm{BP}(\cdot)$ is colored blue, which happens with probability $1 - p$. Thus we get

$$\mathbb{E}(dB(t) \mid \mathrm{BP}(t)) = (1-p)(Z(t) + B(t))\, dt.$$

Combining we get

$$\mathbb{E}(dY(t) \mid \mathrm{BP}(t)) = (2-p)\, Y(t)\, dt.$$

Now using (5.4) gives $\mathbb{E}(dM(t) \mid \mathrm{BP}(t)) = 0$, which completes the proof that $M(\cdot)$ is a
martingale.

Let us next show that $M(\cdot)$ is an $L^2$ bounded martingale. The process $Y^2(t + dt)$ can
take values $(Y(t) + 1)^2$ or $(Y(t) + 2)^2$ at rate $pY(t)$ and $(1-p)Y(t)$ respectively. Thus we
get

$$\mathbb{E}(dM^2(t) \mid \mathrm{BP}(t)) = (4 - 3p)\, e^{-(2-p)t}\, M(t)\, dt.$$

In particular the process $U(t)$ defined as

$$U(t) = M^2(t) - (4 - 3p) \int_0^t e^{-(2-p)s}\, M(s)\, ds$$

is a martingale. Taking expectations, and noting that $\mathbb{E}(M(s)) = 1$ for all $s$ since $M(\cdot)$ is
a martingale, gives

$$\mathbb{E}(M^2(t)) = 1 + (4 - 3p) \int_0^t e^{-(2-p)s}\, ds \le 1 + \frac{4 - 3p}{2 - p}.$$

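This shows $L^2$ boundedness and immediately implies that there exists a random variable
$W^*$ such that

$$e^{-(2-p)t}\left(|\mathrm{BP}(t)| + B(t)\right) \longrightarrow W^*.$$

Using equation (5.2) shows that $e^{-(2-p)t}|\mathrm{BP}(t)| \to W^*/(2-p) := W$. Now we only need
to show that $W$ is strictly positive. First note that by $L^2$ convergence, $\mathbb{E}(W^*) = 1$. This shows
that $\mathbb{P}(W = 0) = r < 1$. Let $\zeta_1 < \zeta_2 < \cdots$ be the times of birth of children (blue or red)
of the root vertex $v_1$ and write $\mathrm{BP}_i(\cdot)$ for the subtree consisting of the $i$-th child and its
descendants. Then

$$e^{-(2-p)t}\, |\mathrm{BP}(t)| = \sum_{i=1}^{\infty} e^{-(2-p)\zeta_i}\, e^{-(2-p)(t - \zeta_i)}\, |\mathrm{BP}_i(t - \zeta_i)|\, \mathbf{1}\{\zeta_i \le t\} + e^{-(2-p)t}.$$

Thus as $t \to \infty$ we have the distributional identity $W \stackrel{d}{=} \sum_{i=1}^{\infty} e^{-(2-p)\zeta_i}\, W_i$, where $\{W_i\}_{i \ge 1}$
are independent and identically distributed with the same distribution as $W$ (independent
of $\{\zeta_i\}_{i \ge 1}$). Thus

$$\mathbb{P}(W = 0) = \mathbb{P}(W_i = 0 \ \ \forall\, i \ge 1) = 0,$$

since the root has infinitely many children and each $W_i$ vanishes independently with
probability $r < 1$.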

Before ending this Section, we derive some elementary properties of the offspring of
an individual in $\mathrm{BP}(\cdot)$. Let $\sigma_v$ be the time of birth of vertex $v$ in $\mathrm{BP}(\cdot)$. Recall that
$c_B(v, \sigma_v + s)$ and $c_R(v, \sigma_v + s)$ denote the number of blue and red children respectively of
this vertex $s$ units of time after the birth of $v$. Also define the process

$$M^*(t) := c_R(v, t + \sigma_v) - \int_0^t p\,(c_B(v, \sigma_v + s) + 1)\, ds, \quad t \ge 0.$$

Lemma 5.4 (Offspring distribution properties).

(a) Conditional on $\mathrm{BP}(\sigma_v)$, the process $c_B(v, \sigma_v + \cdot)$ has the same distribution as
$\mathrm{Yu}_{1-p}(\cdot) - 1$. In particular $\mathbb{E}(c_B(v, \sigma_v + t)) = e^{(1-p)t} - 1$.

(b) The process $M^*(t)$ is a martingale with respect to the filtration $\{\mathrm{BP}(\sigma_v + s) : s \ge 0\}$.
In particular $\mathbb{E}(c_R(v, \sigma_v + t)) = \frac{p}{1-p}\left(e^{(1-p)t} - 1\right)$.

Proof. Part (a) is obvious from the construction. To prove (b), note that
$\mathbb{E}(dc_R(v, \sigma_v + t) \mid \mathrm{BP}(t + \sigma_v)) = p\,(c_B(v, \sigma_v + t) + 1)\, dt$. □
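For completeness, the expectation in part (b) follows from part (a) by integrating the compensator of the red-children process. The short computation below is our own expansion of that step, using the mean $e^{(1-p)s}$ of a rate $1-p$ Yule process at time $s$.

```latex
\[
  \mathbb{E}\big(c_R(v, \sigma_v + t)\big)
  = \int_0^t p\,\Big(\mathbb{E}\big(c_B(v, \sigma_v + s)\big) + 1\Big)\, ds
  = p \int_0^t e^{(1-p)s}\, ds
  = \frac{p}{1-p}\Big(e^{(1-p)t} - 1\Big).
\]
```

Adding the two means gives the expected total number of children by age $t$, namely $(e^{(1-p)t} - 1)/(1-p)$.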

5.3. Convergence for blue children proportions. The equivalence between $\mathrm{BP}(\cdot)$ and
the Superstar Model will imply that the number of vertices with degree $k + 1$ in $G_{n+1}$ is
the same as the number of vertices in $\mathrm{BP}(\tau_n)$ with exactly $k$ blue children. We will need
general results on the asymptotics of such counts for the process $\mathrm{BP}(t)$ as $t \to \infty$.

Theorem 5.5. Fix $k \ge 1$ and let $Z_{\ge k}(t)$ denote the number of vertices in $\mathrm{BP}(t)$ which
have at least $k$ blue children. Then

$$e^{-(2-p)t}\, Z_{\ge k}(t) \xrightarrow{\text{a.s.}} p_{\ge k}(\infty)\, \frac{W^*}{2-p}$$

as $t \to \infty$, where $W^*$ is the martingale limit obtained in Proposition 5.3 and $p_{\ge k}(\infty)$ is
defined by

$$p_{\ge k}(\infty) = k! \prod_{i=1}^{k} \left(i + \frac{2-p}{1-p}\right)^{-1}.$$

Proof: The proof uses a variant of the "reproduction martingale" technique developed 
in [21]. The proof relies on two steps: 

(a) Proving convergence of expectations of the desired quantities: Section 5.3.1. 

(b) Bootstrapping this to a.s. convergence using the law of large numbers: Section 5.3.2.

We set up some initial notation to carry out this program. Write $\xi = (\zeta_1, \zeta_2, \ldots)$ for
the offspring birth times of the root vertex $v_1$ (the offspring distribution of any vertex in
$\mathrm{BP}(\cdot)$ is independent with the same distribution). For $t \ge 0$, let $\xi[0, t]$ denote the number
of offspring in the interval $[0, t]$ and let $\mu[0, t] = \mathbb{E}(\xi[0, t])$ be the corresponding intensity
measure. We start with a simple Lemma which will have profound consequences.

Lemma 5.6 (Renewal measure). Define $\alpha = 2 - p$. Then

$$\int_0^\infty e^{-\alpha t}\, \mu(dt) = 1.$$

Thus the measure defined as $\mu_\alpha(dt) := e^{-\alpha t}\, \mu(dt)$ is a probability measure. Further this
measure has expectation $\int_0^\infty t\, \mu_\alpha(dt) = 1$.

Proof: Recall that in Lemma 5.4 we used $c_B(v_1, t)$, $c_R(v_1, t)$ to denote the number of
blue and red children respectively of vertex $v_1$. Then $\mu([0, t]) = \mathbb{E}(c_R(v_1, t) + c_B(v_1, t))$.
Further by Fubini's theorem

$$\int_0^\infty e^{-\alpha t}\, \mu(dt) = \alpha \int_0^\infty e^{-\alpha t}\, \mu[0, t]\, dt.$$

Using the expressions for $\mathbb{E}(c_B(v_1, t))$, $\mathbb{E}(c_R(v_1, t))$ from Lemma 5.4 completes the proof.
The second assertion regarding the expectation follows similarly. □
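The computation behind the proof can be written out explicitly. The lines below are our own expansion, obtained by substituting the two means from Lemma 5.4; they show that the tilted measure is in fact the standard exponential distribution, which makes both assertions of Lemma 5.6 immediate.

```latex
\[
  \mu[0,t] = \Big(e^{(1-p)t} - 1\Big) + \frac{p}{1-p}\Big(e^{(1-p)t} - 1\Big)
           = \frac{e^{(1-p)t} - 1}{1-p},
  \qquad\text{so}\qquad
  \mu(dt) = e^{(1-p)t}\, dt .
\]
\[
  \mu_\alpha(dt) = e^{-(2-p)t}\, e^{(1-p)t}\, dt = e^{-t}\, dt,
  \qquad
  \int_0^\infty e^{-t}\, dt = 1,
  \qquad
  \int_0^\infty t\, e^{-t}\, dt = 1 .
\]
```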

5.3.1. Convergence of expectations. The first step in the proof of Theorem 5.5 is
convergence of expectations. This follows using standard renewal theory. However we will first
need to set up notation that will allow us to use the linearity of expectations to derive a
renewal equation.

Let us motivate an abstract definition of a characteristic. Fix some time $t > 0$. Suppose
we are interested in the number of vertices with at least $k$ blue children at this time. For any
vertex $v \in \mathrm{BP}(\cdot)$, write $\sigma_v$ for the time of birth of the vertex into $\mathrm{BP}(\cdot)$. Then conditional
on $\mathrm{BP}(\sigma_v)$, the distribution of the number of blue children of vertex $v$ by age $t - \sigma_v$ is
that of $\mathrm{Yu}^v_{1-p}(t - \sigma_v) - 1$, where we construct a countable family of independent rate $1 - p$
Yule processes $\mathrm{Yu}^v_{1-p}(\cdot)$ and use these to construct $\mathrm{BP}(\cdot)$ along with additional randomization
for the red vertices. In particular, writing $Z_{\ge k}(t)$ for the number of vertices with
at least $k$ blue children, this can be expressed as

$$Z_{\ge k}(t) = \sum_{v \in \mathrm{BP}(t)} \mathbf{1}\{[\mathrm{Yu}^v_{1-p}(t - \sigma_v) - 1] \ge k\}.$$
ueBP(t) 

This motivates the following abstract construction. Let $\phi : \mathbb{R}_+ \to \mathbb{R}_+$ be a bounded
($\sup_t \phi(t) \le C$ for some non-random constant $C$) non-negative measurable stochastic
process which depends only on the offspring distribution of a single vertex, often referred to
as a characteristic, see e.g. [16]. Let $\phi_v(\cdot)$ be copies of this characteristic for each vertex
$v \in \mathrm{BP}$. Finally define

$$Z_\phi(t) = \sum_{v \in \mathrm{BP}(t)} \phi_v(t - \sigma_v), \quad t \ge 0,$$

for the branching process $\mathrm{BP}(\cdot)$ counted according to characteristic $\phi$. The main examples
of interest are

(a) Total size: $\phi(t) = 1$ gives $Z_\phi(t) = |\mathrm{BP}(t)|$.

(b) Degree: $\phi(t) = \mathbf{1}\{k \text{ or more blue children at time } t\}$ gives $Z_\phi(t) = Z_{\ge k}(t)$.

Fix any time $t > 0$. Conditioning on the offspring distribution of $v_1$, both of these
characteristics satisfy the recursion

$$Z_\phi(t) = \phi(t) + \sum_{i : \zeta_i \le t} Z_\phi^{(i)}(t - \zeta_i), \tag{5.5}$$

where $Z_\phi^{(i)}(\cdot) \stackrel{d}{=} Z_\phi(\cdot)$ and are independent. Taking expectations and writing
$m_\phi(t) = \mathbb{E}(Z_\phi(t))$, these functions satisfy the renewal equation

$$m_\phi(t) = \mathbb{E}(\phi(t)) + \int_0^t m_\phi(t - s)\, \mu(ds).$$

Lemma 5.6 and renewal theory ([15]) now imply the next result.
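Proposition 5.7. For bounded characteristics, writing $\alpha = 2 - p$, we have

$$e^{-\alpha t}\, m_\phi(t) \to \int_0^\infty e^{-\alpha s}\, \mathbb{E}(\phi(s))\, ds := m_\phi(\infty).$$

Corollary 5.8. Taking the two characteristics of interest one gets for $\phi(t) = 1$

$$e^{-\alpha t}\, \mathbb{E}(|\mathrm{BP}(t)|) \to \frac{1}{\alpha} \quad \text{as } t \to \infty,$$

and for $\phi(t) = \mathbf{1}\{k \text{ or more blue children at time } t\}$

$$e^{-\alpha t}\, \mathbb{E}(Z_{\ge k}(t)) \to \frac{p_{\ge k}(\infty)}{\alpha} \quad \text{as } t \to \infty.$$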

Proof: The first assertion in the corollary is obvious. To prove the second assertion
regarding the number of vertices with at least $k$ blue children, observe that the limit
constant in Proposition 5.7 can be written as

$$\frac{1}{\alpha} \int_0^\infty \alpha e^{-\alpha s}\, \mathbb{E}(\mathbf{1}\{k \text{ or more blue children at time } s\})\, ds = \frac{1}{\alpha}\, \mathbb{P}(c_B(v_1, T) \ge k),$$

where $T$ is an exponential random variable with mean $\alpha^{-1}$ that is independent of the blue
offspring distribution $c_B(v_1, \cdot) = \mathrm{Yu}_{1-p}(\cdot) - 1$, where $\mathrm{Yu}_{1-p}(\cdot)$ is a rate $1 - p$ Yule process. The
inter-arrival times $X_i$ between blue children $i$ and $i + 1$ are independent exponential random
variables with mean $((1-p)(i+1))^{-1}$, independent of $T$. In particular
$\mathbb{P}(c_B(v_1, T) \ge k) = \mathbb{P}(T > \sum_{j=0}^{k-1} X_j)$. One can check that the last probability equals
$p_{\ge k}(\infty)$. □
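The final check can be carried out in one line. The computation below is our own expansion: it uses $\mathbb{P}(T > s) = e^{-\alpha s}$, the independence of $T$ and the $X_j$, and the Laplace transform of an exponential random variable, together with $\alpha/(1-p) = (2-p)/(1-p)$.

```latex
\[
  \mathbb{P}\Big(T > \sum_{j=0}^{k-1} X_j\Big)
  = \mathbb{E}\Big(e^{-\alpha \sum_{j=0}^{k-1} X_j}\Big)
  = \prod_{j=0}^{k-1} \frac{(1-p)(j+1)}{(1-p)(j+1) + \alpha}
  = \prod_{i=1}^{k} \frac{i}{\,i + \frac{2-p}{1-p}\,}
  = k! \prod_{i=1}^{k} \Big(i + \frac{2-p}{1-p}\Big)^{-1}
  = p_{\ge k}(\infty).
\]
```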

5.3.2. Almost sure convergence. The aim of this section is to strengthen the convergence 
of expectations to almost sure convergence. A key role is played by a "reproduction 
martingale", a close relative of the martingale used in [21] to analyze single type branching 
processes as well as in [18] to analyze times of first birth in generations. As before let
$v_1, v_2, v_3, \ldots$ denote the order in which vertices appear and let $\tau_i = \sigma_{v_i}$ denote the times at
which these vertices are born into the branching process $\mathrm{BP}(\cdot)$. Let $\xi^{(i)} = (\zeta_{v_i,1}, \zeta_{v_i,2}, \ldots)$
denote the offspring point process of $v_i$. Viewing $\xi^{(i)}$ as a random measure on $\mathbb{R}_+$, we get

$$\xi_\alpha^{(i)} := \sum_{j=1}^{\infty} e^{-\alpha \zeta_{v_i,j}} = \int_0^\infty e^{-\alpha t}\, \xi^{(i)}(dt).$$



For $m \ge 1$ let $\mathcal{F}_m$ be the sigma-algebra generated by vertices $\{v_1, \ldots, v_m\}$ and their offspring point processes (i.e. for $1 \le i \le m$, $\mathcal{F}_m$ has the type of $v_i$, times of birth as well as types of all the offspring). Define $R_0 = 1$ and

\[ R_{m+1} := R_m + e^{-\alpha \sigma_{v_{m+1}}} \big( \xi_\alpha^{(m+1)} - 1 \big). \]

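Let $\Gamma_m$ be the set of the first $m$ individuals born and all of their offspring. It is easy to check that

\[ R_m = 1 + \sum_{i=1}^{m} e^{-\alpha \sigma_{v_i}} \big( \xi_\alpha^{(i)} - 1 \big) = \sum_{x \in \Gamma_m \setminus \{v_1, \ldots, v_m\}} e^{-\alpha \sigma_x}. \qquad (5.6) \]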

In particular $R_m \ge 0$ for all $m$. The next Proposition shows that the sequence $\{R_m\}_{m \ge 1}$ is much more.

Proposition 5.9 (Reproduction martingale). The sequence $\{R_m\}_{m \ge 1}$ is a non-negative martingale with respect to the filtration $\{\mathcal{F}_m\}_{m \ge 1}$. Thus there exists a random variable $R_\infty$ with $\mathbb{E}(R_\infty) = 1$ such that $R_m \to R_\infty$ almost surely and in $L^2$.

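Proof: By the choice of $\alpha = 2 - p$ in Lemma 5.6, $\mathbb{E}(\xi_\alpha^{(i)}) = \int_0^\infty e^{-\alpha t} \mu(dt) = 1$. Further $\sigma_{v_{m+1}}$ is $\mathcal{F}_m$ measurable while $\xi_\alpha^{(m+1)}$ is independent of $\mathcal{F}_m$. Thus one gets

\[ \mathbb{E}(R_{m+1} - R_m \mid \mathcal{F}_m) = e^{-\alpha \sigma_{v_{m+1}}}\, \mathbb{E}\big( \xi_\alpha^{(m+1)} - 1 \big) = 0. \]

Now assuming $\mathbb{E}([\xi_\alpha^{(1)}]^2) < \infty$, we see by the orthogonal increments of the martingale $R_m$ that

\[ \mathbb{E}(R_m^2) \le \mathbb{E}\big( [\xi_\alpha^{(1)}]^2 \big)\, \mathbb{E}\Big( \sum_{i=1}^{m} e^{-2\alpha \sigma_{v_i}} \Big). \]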

Thus to check L 2 boundedness it is enough to check that the right hand side is bounded. 
The following lemma accomplishes this. 

Lemma 5.10.

(a) Assume $0 < p < 1$. Then $\mathbb{E}([\xi_\alpha]^2) < \infty$.

(b) For any $m$, $\mathbb{E}\big( \sum_{i=1}^{m} e^{-2\alpha \sigma_{v_i}} \big) \le 1 + \alpha^{-1}$.

Proof: To prove (a), we observe that $\xi_\alpha = \int_0^\infty \alpha e^{-\alpha t}\, \xi[0,t]\, dt = \mathbb{E}(\xi[0,T] \mid \xi)$, where $T$ is an exponential random variable with mean $\alpha^{-1}$ independent of $\xi$, which is the offspring point process of $v_1$. By Jensen's inequality it is thus enough to show $\mathbb{E}([\xi[0,T]]^2) < \infty$. Note that $\xi[0,T] = c_R(v_1,T) + c_B(v_1,T)$, i.e. the number of red and blue vertices born to $v_1$ by the random time $T$. Thus it is enough to show that $\mathbb{E}(c_R^2(v_1,T))$ and $\mathbb{E}(c_B^2(v_1,T))$ are finite. We condition on $T = t$ and note that, by Lemma 5.2, for fixed $t$ one has $\mathbb{E}(c_B^2(v_1,t)) \le C e^{2(1-p)t}$, while for any $t$, conditional on $c_B(v_1,t)$, $c_R(v_1,t)$ is stochastically bounded by a Poisson random variable with rate $t\, c_B(v_1,t)$. Noting that $\alpha = 2 - p$, we get

\[ \mathbb{E}([\xi[0,T]]^2) \le C \int_0^\infty e^{-(2-p)t} \big( e^{2(1-p)t} + t^2 e^{2(1-p)t} \big)\, dt < \infty. \]

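To prove (b), let $S(t) = \sum_{u \in \mathrm{BP}(t)} e^{-2\alpha \sigma_u}$. Then $\sum_{i=1}^{m} e^{-2\alpha \sigma_{v_i}} = S(\tau_m)$. Further, since the rate of creation of new vertices is $|\mathrm{BP}(t)| + B(t)$ (see Proposition 5.3), one has

\[ \mathbb{E}(dS(t) \mid \mathrm{BP}(t)) = e^{-2\alpha t} \big( |\mathrm{BP}(t)| + B(t) \big)\, dt. \]

Taking expectations and noting that $e^{-\alpha t}(|\mathrm{BP}(t)| + B(t))$ is a martingale gives

\[ \mathbb{E}(S(t)) = 1 + \int_0^t e^{-\alpha s}\, ds \le 1 + \alpha^{-1}. \]

This completes the proof. □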
The next Theorem completes the proof of Theorem 5.5. Recall the limit constant $m^{\phi}(\infty)$ in Proposition 5.7.

Theorem 5.11 (Convergence of characteristics). For any bounded characteristic which satisfies the recursive decomposition in (5.5) one has

\[ e^{-\alpha t} Z^{\phi}(t) \xrightarrow{a.s.} m^{\phi}(\infty)\, R_\infty. \]

Taking $\phi = 1$ and using Proposition 5.3 implies that $R_\infty = W$, the a.s. limit of the martingale $e^{-\alpha t}(|\mathrm{BP}(t)| + B(t))$.

Proof: A key role will be played by the martingale $\{R_n\}_{n \ge 0}$. Recall that this was a martingale with respect to the filtration $\{\mathcal{F}_m\}_{m \ge 0}$. We shall switch gears and now think about the process in continuous time. Define $I(t)$ as the set of individuals born after time $t$ whose mothers were born before time $t$ and let

\[ R_t = \sum_{x \in I(t)} e^{-\alpha \sigma_x} := R_{|\mathrm{BP}(t)|}, \qquad \{\mathcal{F}_t\}_{t \ge 0} := \{\mathcal{F}_{|\mathrm{BP}(t)|}\}_{t \ge 0}. \]

It is easy to check that $R_t$ is an $L^2$ bounded martingale with respect to this filtration and further $R_t \xrightarrow{a.s.} R_\infty$. For a fixed $c > 0$, define $I(t,c)$ as the set of vertices born after time $t + c$ whose mothers are born before time $t$ and let $R_{t,c} = \sum_{x \in I(t,c)} e^{-\alpha \sigma_x}$. Obviously $R_{t,c} \le R_t$. Intuitively one should expect $R_{t,c}$ to be small for large $c$. The next Proposition quantifies this fact. First recall the random variable $\xi_\alpha = \int_0^\infty e^{-\alpha t}\, \xi(dt)$ and write $\xi_\alpha(c) = \int_c^\infty e^{-\alpha t}\, \xi(dt)$. Let $K(c) = \mathbb{E}(\xi_\alpha(c))$. It is easy to check that $K(c) \to 0$ as $c \to \infty$.

Proposition 5.12. There exists a constant $A$ such that for all $c > 0$

\[ \limsup_{t \to \infty} R_{t,c} \le K(c)\, W, \]

where $W = \lim_{t \to \infty} e^{-\alpha t} |\mathrm{BP}(t)|$ from Proposition 5.3.

Proof: Without loss of generality we shall assume $t = k$ is an integer. The proof extends easily to general $t$. A key role is played by a strong law of large numbers; see [3] or [6] for a proof. This result was crucially used in [21] to prove convergence in the one type setting.

Lemma 5.13 (Strong law). Let $n_i$, $i = 1, 2, \ldots$ be a sequence of positive integers and let $X_{ij}$ for $j = 1, 2, \ldots, n_i$ be a triangular array, independent for each fixed $i$ and constructed on the same probability space. Suppose there exists a random variable $Y \ge 0$ with $\mathbb{E}(Y) < \infty$ such that $|X_{ij}|$ is stochastically dominated by $Y$. Further suppose that

\[ \liminf_{i \to \infty} \frac{n_{i+1}}{n_1 + \cdots + n_i} > 0. \qquad (5.7) \]

Then

\[ \frac{\sum_{j=1}^{n_i} \big( X_{ij} - \mathbb{E}(X_{ij}) \big)}{n_i} \xrightarrow{a.s.} 0 \]

as $i \to \infty$. Further assume the random variables are independent as $i$ varies. Then the same is true of

\[ S_k = \frac{\sum_{i=1}^{k} \sum_{j=1}^{n_i} \big( X_{ij} - \mathbb{E}(X_{ij}) \big)}{\sum_{i=1}^{k} n_i}. \]



Proof of Proposition 5.12: Fix $t = k$, where $k$ is an integer. By definition $R_{k,c}$ is made up of contributions from all vertices $u$ who are born after time $k + c$ and whose mothers $v = v(u)$ are in $\mathrm{BP}(k)$. Decomposing the sum $R_{k,c}$ according to the times of birth of the mothers one has

\[ R_{k,c} = \sum_{i=1}^{k} \sum_{v : \sigma_v \in [i-1, i)} e^{-\alpha \sigma_v} \int_{k + c - \sigma_v}^{\infty} e^{-\alpha s}\, \xi_v(ds). \]

Writing $\xi_\alpha^{v}(y) = \int_y^\infty e^{-\alpha t}\, \xi_v(dt)$, where $\xi_v(\cdot)$ is the offspring point process of $v$, one immediately has

\[ R_{k,c} \le e^{-\alpha k} \sum_{i=1}^{k} \sum_{v : \sigma_v \in [i-1, i)} \xi_\alpha^{v}(c). \]

These random variables are independent across different $v$ and are all stochastically bounded by the random variable $\xi_\alpha(c)$. Writing $n_i = |\mathrm{BP}(i)| - |\mathrm{BP}(i-1)|$, Prop 5.3 implies that the conditions in Lemma 5.13 are satisfied. Thus one has

\[ e^{-\alpha k} |\mathrm{BP}(k)| \cdot \frac{\sum_{i=1}^{k} \sum_{v : \sigma_v \in [i-1, i)} \xi_\alpha^{v}(c)}{|\mathrm{BP}(k)|} \xrightarrow{a.s.} W\, \mathbb{E}(\xi_\alpha(c)). \]

This completes the proof. □
Completing the proof of Theorem 5.11: Recall that we are dealing with bounded characteristics, i.e. $\|\phi\|_\infty \le C$ for some constant $C$. Without loss of generality, let $C = 1$. We shall show that there exists a constant $\kappa$ such that for all $\varepsilon > 0$,

\[ \limsup_{t \to \infty} \big| e^{-\alpha t} Z^{\phi}(t) - m^{\phi}(\infty) R_\infty \big| \le \kappa\, \varepsilon\, (W + R_\infty). \qquad (5.8) \]

Since this is true for arbitrary $\varepsilon$, this completes the proof. Thus fix any $\varepsilon > 0$. First choose $c$ large enough that the function arising in the bound of Proposition 5.12 satisfies $K(c) < \varepsilon$. Next, define $\phi_s$ as the truncated characteristic

\[ \phi_s(u) := \phi(u)\, 1\{u \le s\}. \qquad (5.9) \]

This characteristic is zero for any vertex that has been alive for more than $s$, so we can view it as a characteristic for "young" vertices. The limit constant for this characteristic by Proposition 5.7 is

\[ m^{\phi_s}(\infty) = \int_0^s e^{-\alpha u}\, \mathbb{E}(\phi(u))\, du. \]

Here $\phi$ is the original characteristic. If we write $\phi' = \phi - \phi_s$, we can view $\phi'$ as the characteristic for "old" vertices. With this notation we have $Z^{\phi}(u) = Z^{\phi_s}(u) + Z^{\phi'}(u)$.

Define $\tilde{m}^{\phi_s}(u) = e^{-\alpha u}\, \mathbb{E}(Z^{\phi_s}(u))$. Now choose $s$ large enough such that $s > c$ and for all $u > s - c$ one has $e^{-\alpha s} < \varepsilon$, $|m^{\phi_s}(\infty) - m^{\phi}(\infty)| < \varepsilon$, and $|\tilde{m}^{\phi_s}(u) - m^{\phi_s}(\infty)| < \varepsilon$. The constants $s$ and $c$ shall remain fixed for the rest of the argument.

Let us understand $Z^{\phi_s}(\cdot)$, which is the branching process counted according to the truncated characteristic. We first observe that since $\phi_s(u) = 0$ when $u > s$, for any $t > s$, vertices born before time $t - s$ (old vertices) do not contribute to $Z^{\phi_s}(t)$. Thus we can write

\[ Z^{\phi_s}(t) = \sum_{x \in I(t-s)} Z_x^{\phi_s}(t - \sigma_x) = \sum_{x \in I(t-s) \setminus I(t-s,c)} Z_x^{\phi_s}(t - \sigma_x) + \sum_{x \in I(t-s,c)} Z_x^{\phi_s}(t - \sigma_x), \]

where $Z_x^{\phi_s}(t - \sigma_x)$ are the contributions to $Z^{\phi_s}(t)$ by the descendants of a vertex $x$ born in the interval $[t-s, t]$ whose mother belongs to $\mathrm{BP}(t-s)$. Let $\mathcal{N}(t,c) = I(t) \setminus I(t,c)$, i.e. the set of individuals born in the interval $[t, t+c]$ to mothers who were born before time $t$. Then we can decompose the difference as a telescoping sum:

\[ e^{-\alpha t} Z^{\phi}(t) - m^{\phi}(\infty) R_\infty := E_1(t) + E_2(t) + E_3(t) + E_4(t) + E_5(t). \qquad (5.10) \]

Here:

(a) $E_1(t)$ is defined as

\[ E_1(t) = e^{-\alpha t} Z^{\phi'}(t). \]

Observe that for $E_1(t)$, the only vertices which contribute are those with age greater than $s$ (since $\phi'(u) = 0$ for $u < s$). In particular $E_1(t) = e^{-\alpha t} Z^{\phi'}(t) \le e^{-\alpha t} |\mathrm{BP}(t-s)|$. Thus by Prop 5.3, one has $\limsup_{t \to \infty} E_1(t) \le e^{-\alpha s} W < \varepsilon W$ by choice of $s$.

(b) $E_2(t)$ is defined as

\[ E_2(t) := \sum_{x \in \mathcal{N}(t-s,c)} e^{-\alpha \sigma_x} \Big( e^{-\alpha (t - \sigma_x)} Z_x^{\phi_s}(t - \sigma_x) - \tilde{m}^{\phi_s}(t - \sigma_x) \Big). \]

For $E_2(t)$, $\mathcal{N}(t-s,c)$ consists of all children of mothers in $\mathrm{BP}(t-s)$ born in the interval $[t-s, t-s+c]$. Since each of the individuals in $\mathrm{BP}(t-s)$ reproduces at rate at least 1, one can check by the strong law of large numbers that $\liminf_{t \to \infty} |\mathcal{N}(t-s,c)| / |\mathrm{BP}(t-s)| \ge c$. Further the terms in the summand (conditional on $\mathrm{BP}(t-s)$) are independent random variables and each such term in the sum looks like $X - \mathbb{E}(X)$, where $X$ is stochastically bounded by the random variable $Z^{\phi_s}(c)$. Similar to the proof of Prop 5.12, using Lemma 5.13 one can show that $\limsup_{t \to \infty} |E_2(t)| = 0$ a.s. We omit the details.

(c) $E_3(t)$ is defined as

\[ E_3(t) := \sum_{x \in \mathcal{N}(t-s,c)} e^{-\alpha \sigma_x} \Big( \tilde{m}^{\phi_s}(t - \sigma_x) - m^{\phi_s}(\infty) \Big). \]

By the choice of $s$, since $t - \sigma_x > s - c$, we have $|\tilde{m}^{\phi_s}(t - \sigma_x) - m^{\phi_s}(\infty)| < \varepsilon$. Thus one has $|E_3(t)| \le \varepsilon R_{t-s}$. Letting $t \to \infty$, one gets $\limsup_{t \to \infty} |E_3(t)| \le \varepsilon R_\infty$.

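(d) $E_4(t)$ is defined as

\[ E_4(t) := m^{\phi_s}(\infty) \Big( \sum_{x \in \mathcal{N}(t-s,c)} e^{-\alpha \sigma_x} - R_{t-s} \Big). \]

For $E_4(t)$, we have $\big| \sum_{x \in \mathcal{N}(t-s,c)} e^{-\alpha \sigma_x} - R_{t-s} \big| = R_{t-s,c}$. Thus $\limsup_{t \to \infty} |E_4(t)| \le m^{\phi_s}(\infty) K(c) W \le m^{\phi_s}(\infty)\, \varepsilon W$ by choice of $c$ and using Proposition 5.12 for the asymptotics of $R_{t,c}$.

(e) Finally $E_5(t) := m^{\phi_s}(\infty) (R_{t-s} - R_\infty)$.

Since $R_{t-s} \xrightarrow{a.s.} R_\infty$, $E_5(t) \xrightarrow{a.s.} 0$.

Combining all these bounds, one finally arrives at

\[ \limsup_{t \to \infty} \big| e^{-\alpha t} Z^{\phi}(t) - m^{\phi}(\infty) R_\infty \big| \le \varepsilon \big( W + m^{\phi_s}(\infty) R_\infty \big). \]

Since $\varepsilon > 0$ was arbitrary, this completes the proof. □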

5.4. Time of first birth asymptotics. For a rooted tree with root $\rho$, there is a natural notion of a generation of a vertex $v$, which is the number of edges on the path between $v$ and $\rho$. Thus $\rho$ belongs to generation zero, all the neighbors of $\rho$ belong to generation one, and so forth. The aim of this Section is to define a modified notion of generation in $\mathrm{BP}(t)$. For each fixed $k$, we shall then define a stopping time $\mathrm{Bir}(k)$ representing the first time an individual in modified generation $k$ is born into the process $\mathrm{BP}(\cdot)$. We shall study asymptotics of $\mathrm{Bir}(k)$ as $k \to \infty$. In the next Section we shall show how these asymptotics result in height asymptotics for the Superstar Model.

Fix $t > 0$. For each vertex $v \in \mathrm{BP}(t)$ let $r(v)$ denote the first red vertex on the path between $v$ and the original progenitor of the process $\mathrm{BP}(\cdot)$, namely $v_1$. If $v$ is a red vertex then $r(v) = v$. Let $d(v)$ be the number of edges on the path between $v$ and $r(v)$, so that $d(v) = 0$ if $v$ is a red vertex.

Fix $k \ge 1$. Let $\mathrm{Bir}(k)$ denote the stopping time

\[ \mathrm{Bir}(k) = \inf \{ t > 0 : \exists\, v \in \mathrm{BP}(t),\ d(v) = k \}. \]

This is equivalent to the first time that there exists a red vertex in $\mathrm{BP}(t)$ such that the subtree consisting of all blue descendants of this vertex and rooted at this red vertex has an individual in generation $k$. The next theorem proves asymptotics for these times.

Theorem 5.14. Let $W(\cdot)$ be the Lambert function. We have

\[ \frac{\mathrm{Bir}(k)}{k} \xrightarrow{a.s.} \frac{W(1/e)}{1 - p} \quad \text{as } k \to \infty. \]

Proof of Theorem 5.14: Given any rooted tree $\mathcal{T}$ and $v \in \mathcal{T}$, we shall let $G(v)$ denote the generation of this vertex in $\mathcal{T}$. Write $\mathrm{BP}^{b}(\cdot)$ for the subtree consisting of all blue descendants of the original progenitor $v_1$ and rooted at $v_1$. In distribution this is just a single type continuous time branching process where each vertex has a $\mathrm{Yu}_{1-p}(\cdot) - 1$ offspring distribution. Further let

\[ \mathrm{Bir}^*(k) = \inf \{ t : \exists\, v \in \mathrm{BP}^{b}(t),\ G(v) = k \}. \]

In words, this is the time of first birth of an individual in generation $k$ for the branching process $\mathrm{BP}^{b}(\cdot)$. From the definitions of $\mathrm{Bir}(k)$ and $\mathrm{Bir}^*(k)$, we have $\mathrm{Bir}(k) \le \mathrm{Bir}^*(k)$.

Much is known about the time of first birth of a single type supercritical branching process; in particular [18] implies that for the blue subtree process rooted at $v_1$ there exists a limit constant $\beta$ such that $\mathrm{Bir}^*(k)/k \xrightarrow{a.s.} \beta$. Here $\beta$ can be derived as follows. Write $\mu_b$ for the expected intensity measure of the blue offspring distribution, i.e. $\mu_b([0,t]) = \mathbb{E}(c_B(v_1, t)) = e^{(1-p)t} - 1$ from Lemma 5.4. For $\theta > 0$, let $\phi(\theta) := \mathbb{E}\big( \int_0^\infty e^{-\theta t}\, c_B(v_1, dt) \big)$. It is easy to check that this is finite only for $\theta > 1 - p$ since

\[ \phi(\theta) = \theta \int_0^\infty e^{-\theta t} \mu_b([0,t])\, dt = \frac{1 - p}{\theta - (1 - p)}. \]

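For $a > 0$ define

\[ \Lambda(a) := \inf \{ \phi(\theta) e^{\theta a} : \theta > 1 - p \} = (1 - p)\, a\, e^{(1-p)a + 1}. \qquad (5.11) \]

Then the limit constant $\beta$ is derived as

\[ \beta = \sup \{ a > 0 : \Lambda(a) < 1 \}. \qquad (5.12) \]

From this it follows that $\beta = W(1/e)/(1 - p)$ where $W(\cdot)$ is the Lambert function. Then we have

\[ \limsup_{k \to \infty} \frac{\mathrm{Bir}(k)}{k} \le \lim_{k \to \infty} \frac{\mathrm{Bir}^*(k)}{k} = \frac{W(1/e)}{1 - p} \quad \text{a.s.} \]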

This gives an upper bound in Theorem 5.14. Lemma 5.15 proves a lower bound and 
completes the proof. 
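The constant $\beta$ is easy to evaluate numerically. The following is a minimal sketch, not part of the original argument: it computes the principal branch of the Lambert function by Newton's method (the helper names `lambert_w`, `big_lambda` and `beta` are ours) and confirms that $\Lambda(\beta) = 1$ for the closed form of $\Lambda$ in (5.11), with $\Lambda(a) < 1$ below $\beta$ and $\Lambda(a) > 1$ above it.

```python
import math


def lambert_w(x, tol=1e-14):
    """Principal branch of the Lambert function for x >= 0: solves w * e^w = x."""
    w = math.log1p(x)                     # starting guess to the right of the root
    for _ in range(100):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1.0))   # Newton step for f(w) = w e^w - x
        w -= step
        if abs(step) < tol:
            break
    return w


def big_lambda(a, p):
    """Closed form (5.11): Lambda(a) = (1 - p) a exp((1 - p) a + 1)."""
    return (1.0 - p) * a * math.exp((1.0 - p) * a + 1.0)


def beta(p):
    """Limit constant beta = W(1/e) / (1 - p) of Theorem 5.14."""
    return lambert_w(1.0 / math.e) / (1.0 - p)


if __name__ == "__main__":
    for p in (0.1, 0.5, 0.9):
        b = beta(p)
        print(f"p={p}: beta={b:.6f}, Lambda(beta)={big_lambda(b, p):.12f}")
```

Since $\Lambda$ is increasing in $a$, the supremum in (5.12) is attained exactly where $\Lambda(a) = 1$, which is what the printed values of $\Lambda(\beta)$ illustrate.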

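Lemma 5.15. Fix any $\varepsilon > 0$ and let $\beta = W(1/e)/(1 - p)$ be the limit constant. Then

\[ \sum_{l=1}^{\infty} \mathbb{P}\big( \mathrm{Bir}(l) < (1 - \varepsilon) \beta l \big) < \infty. \]

Thus one has $\liminf_{l \to \infty} \mathrm{Bir}(l)/l \ge \beta$ a.s.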

Proof. For ease of notation, for the rest of this proof we shall write $t_\varepsilon(l) = (1 - \varepsilon) \beta l$. In the full process $\mathrm{BP}(\cdot)$, two processes occur simultaneously:

(a) New "roots" (red vertices) are created. Recall that we used $R(\cdot)$ for the counting process for the number of red roots.

(b) The blue descendants of each new root have the same distribution as a single type continuous time branching process with offspring distribution $\mathrm{Yu}_{1-p}(\cdot) - 1$.

Fix $l \ge 2$ and suppose a new red vertex $v$ was created at some time $\sigma_v < t_\varepsilon(l)$. Let $\mathrm{BP}^{b}_v(\cdot)$ denote the subtree of blue descendants of $v$. Let $\mathrm{Bir}^*(v, l) > \sigma_v$ be the time of creation of the first blue vertex in generation $l$ for the subtree $\mathrm{BP}^{b}_v(\cdot)$. Now $\mathrm{Bir}(l) < t_\varepsilon(l)$ if and only if there exists a red vertex $v$ born before $t_\varepsilon(l)$ such that the subtree of blue descendants of this vertex has a vertex in generation $l$ by this time. For a fixed red vertex $v \in \mathrm{BP}(\cdot)$, write $A_v(l)$ for this event. Since $\mathrm{Bir}^*(v, l) - \sigma_v$ has the same distribution as $\mathrm{Bir}^*(l)$, conditional on $\mathrm{BP}(\sigma_v)$ one has

\[ \mathbb{P}(A_v(l) \mid \mathrm{BP}(\sigma_v)) = \mathbb{P}\big( \mathrm{Bir}^*(l) < t_\varepsilon(l) - \sigma_v \big). \]

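Fix $0 < s < (1 - \varepsilon) \beta l$. Then for $\theta > 1 - p$, Markov's inequality implies

\[ \mathbb{P}\big( \mathrm{Bir}^*(l) < (1 - \varepsilon) \beta l - s \big) \le e^{\theta ((1 - \varepsilon) \beta l - s)}\, \mathbb{E}\big[ e^{-\theta\, \mathrm{Bir}^*(l)} \big]. \]

One of the main bounds of Kingman ([18], Theorem 1) is $\mathbb{E}[e^{-\theta\, \mathrm{Bir}^*(l)}] \le (\phi(\theta))^l$. Thus we get

\[ \mathbb{P}\big( \mathrm{Bir}^*(l) < (1 - \varepsilon) \beta l - s \big) \le \big[ \phi(\theta) e^{\theta (1 - \varepsilon) \beta} \big]^l e^{-\theta s}. \qquad (5.13) \]

By the definition of $\beta$,

\[ \Lambda_\varepsilon := \Lambda(\beta (1 - \varepsilon)) := \inf \{ \phi(\theta) e^{\theta (1 - \varepsilon) \beta} : \theta > 1 - p \} < 1. \]

It is easy to check that the minimizer occurs at

\[ \theta_\varepsilon = 1 - p + \frac{1}{(1 - \varepsilon) \beta}. \]

The final probability bound we shall use is

\[ \mathbb{P}\big( \mathrm{Bir}^*(l) < (1 - \varepsilon) \beta l - s \big) \le [\Lambda_\varepsilon]^l e^{-\theta_\varepsilon s}. \qquad (5.14) \]

Let $N_l^\varepsilon$ be the number of red vertices born before time $t_\varepsilon(l)$ whose trees of blue descendants $\mathrm{BP}^{b}_v(\cdot)$ have at least one vertex in generation $l$ by time $t_\varepsilon(l)$. Since $\mathrm{Bir}(l) < t_\varepsilon(l)$ exactly when $N_l^\varepsilon \ge 1$, obviously $\mathbb{P}(\mathrm{Bir}(l) < (1 - \varepsilon) \beta l) \le \mathbb{E}(N_l^\varepsilon)$. Conditioning on the times of birth of red vertices one gets

\[ \mathbb{E}(N_l^\varepsilon) \le \int_0^{t_\varepsilon(l)} [\Lambda_\varepsilon]^l e^{-\theta_\varepsilon s}\, d\mathbb{E}(R(s)) \quad \text{using Eqn. (5.14)}, \]

\[ \phantom{\mathbb{E}(N_l^\varepsilon)} = p\, [\Lambda_\varepsilon]^l \int_0^{t_\varepsilon(l)} e^{-(\theta_\varepsilon - \alpha) s}\, ds \quad \text{using Lemma 5.4}. \]

Simplifying, we get for all $l \ge 2$, $\mathbb{E}(N_l^\varepsilon) \le C [\Lambda_\varepsilon]^l$ for a constant $C$. Thus

\[ \sum_{l=1}^{\infty} \mathbb{P}\big( \mathrm{Bir}(l) < (1 - \varepsilon) \beta l \big) < \infty. \]

□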



6. Equivalence between the branching process and the superstar model 

We start with an informal description of the connection between the Superstar Model and the branching process $\mathrm{BP}(\cdot)$. We connect vertex $v_1$, which is the initial progenitor of $\mathrm{BP}(\cdot)$, to the superstar $v_0$ (which does not play a role in the evolution of $\mathrm{BP}(\cdot)$) in $G_2$. All the red vertices in the process $\mathrm{BP}(\cdot)$ correspond to the neighbors of the superstar $v_0$. The true degree of a (non-superstar) vertex in $G_{n+1}$ is one plus the number of its blue children in $\mathrm{BP}(\tau_n)$, where the additional factor of one comes from the edge connecting this vertex to its ancestor. By elementary properties of the exponential distribution, the dynamics of $\mathrm{BP}(\cdot)$ imply that each new vertex which is born is red (connected to the superstar $v_0$) with probability $p$; otherwise, with probability $q = 1 - p$, it is blue and connected to one of the other vertices, chosen with probability proportional to its current degree, increasing the degree of this chosen vertex by one. This is nothing but the Superstar Model.

Formally our surgery will take the random tree $\mathrm{BP}(\tau_n)$ and modify it to get an $(n+1)$-vertex tree $S_n$ which has the same distribution as the superstar model $G_{n+1}$. From this we will be able to read off the probabilistic properties of the Superstar tree $G_n$.

As before we label the vertices of $\mathrm{BP}(\tau_n)$ by $\{v_1, v_2, \ldots, v_n\}$ in order of their birth and then we add a new vertex $v_0$ to this set to give us the vertex set for $G_{n+1}$. One can anticipate that $v_0$ will be our superstar.

Figure 6.1. The surgery passing from $\mathrm{BP}(\tau_n)$ to $S_{n+1}$ and $G_{n+1}$ for $n = 6$.

Next, we define the edge set for $S_n$. To do this, we take each red vertex $v$ in $\mathrm{BP}(\tau_n)$, remove the edge connecting $v$ to its parent (if it has one), and then we create a new edge between $v$ and $v_0$. To complete the construction of $S_n$ it only remains to ignore the color of the vertices. An illustration of this surgery for $n = 6$ is given in Figure 6.1.
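The surgery can also be traced on a small simulated example. The sketch below is only an illustration under assumptions that go beyond the text: it grows the jump chain of $\mathrm{BP}(\cdot)$ (each new vertex picks its mother with probability proportional to one plus her number of blue children and is red with probability $p$), it treats $v_1$ as red since $v_1$ is joined to $v_0$, and all function names are hypothetical. It then re-attaches the red vertices to the superstar, labelled 0, and exposes the two quantities used later: the number of blue children of a vertex and the distance $d(v)$ to the first red ancestor.

```python
import random


def grow_bp_skeleton(n, p, seed=0):
    """Jump chain of BP(.) with n vertices labelled 1..n in order of birth."""
    rng = random.Random(seed)
    parent = {1: None}
    red = {1: True}              # assumption: the progenitor v_1 is treated as red
    blue_children = {1: 0}
    for new in range(2, n + 1):
        verts = list(range(1, new))
        weights = [blue_children[v] + 1 for v in verts]   # birth rate of each vertex
        mother = rng.choices(verts, weights=weights, k=1)[0]
        parent[new] = mother
        red[new] = rng.random() < p
        blue_children[new] = 0
        if not red[new]:
            blue_children[mother] += 1
    return parent, red, blue_children


def surgery(parent, red):
    """Ancestor map of S_n: red vertices hang off the superstar 0, blue ones keep their mother."""
    return {v: (0 if red[v] else parent[v]) for v in parent}


def distance_to_superstar(tree, v):
    """Graph distance from v to the superstar 0 in the tree after surgery."""
    steps = 0
    while v != 0:
        v = tree[v]
        steps += 1
    return steps


def d(parent, red, v):
    """Number of edges from v up to the first red vertex r(v) in BP."""
    steps = 0
    while not red[v]:
        v = parent[v]
        steps += 1
    return steps


if __name__ == "__main__":
    parent, red, blue_children = grow_bp_skeleton(6, 0.3, seed=1)
    tree = surgery(parent, red)
    for v in sorted(tree):
        print(v, "red" if red[v] else "blue", "ancestor after surgery:", tree[v])
```

On any such sample one can confirm that the degree of a non-superstar vertex after surgery is its number of blue children plus one, and that its distance to the superstar is $d(v) + 1$.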

Proposition 6.1 (Equivalence from surgery operation). The tree $S_n$ viewed as a tree with vertices without colors has the same distribution as the Superstar Model $G_{n+1}$. In fact the process $\{S_n\}_{n \ge 1}$ has the same distribution as $\{G_{n+1}\}_{n \ge 1}$.

Proof: We shall prove this by induction. Think of $S_n$ as being rooted at $v_0$ so that every vertex except $v_0$ in $S_n$ has a unique ancestor. The ancestor of all the red individuals is the superstar $v_0$ while the ancestors of all of the blue individuals are unchanged from $\mathrm{BP}(\tau_n)$.

The induction hypothesis will be that $S_n$ has the same distribution as $G_{n+1}$ and the degree of each non-superstar vertex in $S_n$ is the number of blue children it possesses plus one for the edge connecting the vertex to its ancestor in $S_n$. Condition on $\mathrm{BP}(\tau_n)$ and fix $v \in \mathrm{BP}(\tau_n)$. By the property of the exponential distribution, the probability that the next vertex born into the system is born to vertex $v$ is given by

\[ \frac{\lambda(v, \tau_n)}{\sum_{u \in \mathrm{BP}(\tau_n)} \lambda(u, \tau_n)} = \frac{c_B(v, \tau_n) + 1}{\sum_{u \in \mathrm{BP}(\tau_n)} \big( c_B(u, \tau_n) + 1 \big)}. \]

Thus a new vertex attaches to vertex $v$ with probability proportional to the present degree of $v$ in $S_n$. Further, with probability $p$, this vertex is colored red, whence by the surgery operation, the edge to $v$ is deleted and this new vertex is connected to the superstar $v_0$. In this case the degree of $v$ in $S_n$ is unchanged. With probability $1 - p$ this new vertex is colored blue, whence the surgery operation does not disturb this vertex so that the degree of vertex $v$ is increased by one. These are exactly the dynamics of $G_{n+2}$ conditional on $G_{n+1}$. By induction the result follows. □
For the rest of the proof we shall assume $G_{n+1}$ is constructed through this surgery process and suppress $S_n$.

7. Proofs of the main results

Let us now prove all the main results by using the equivalence created by the surgery operation and the proven results on $\mathrm{BP}(\cdot)$ in Section 5. We record the following fact about the asymptotics for the stopping times $\tau_n$.

Lemma 7.1 (Stopping time asymptotics). The stopping times $\tau_n$ satisfy

\[ \tau_n - \frac{1}{2 - p} \log n \xrightarrow{a.s.} -\frac{1}{2 - p} \log W. \]

Proof: Proposition 5.3 proves that $|\mathrm{BP}(t)|\, e^{-(2-p)t} \xrightarrow{a.s.} W$. Thus $n\, e^{-(2-p)\tau_n} \xrightarrow{a.s.} W$, and taking logarithms gives the result. □

7.1. Proof of the Superstar strong law. By the surgery operation, the degree of the superstar is given by $R(\tau_n)$, the total number of red vertices. Equation (5.2) shows that the number of blue vertices satisfies $B(\tau_n)/|\mathrm{BP}(\tau_n)| \xrightarrow{a.s.} 1 - p$. Thus $R(\tau_n)/|\mathrm{BP}(\tau_n)| \xrightarrow{a.s.} p$. This completes the proof. □
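The strong law is easy to observe in simulation. The following is a minimal sketch that grows the Superstar Model directly rather than through $\mathrm{BP}(\cdot)$; the labelling (superstar 0, other vertices $1, \ldots, n$, starting from the single edge between 0 and 1) and the function name are our assumptions for illustration. Preferential attachment among the non-superstar vertices is implemented by keeping a list in which each vertex appears once per unit of degree.

```python
import random


def simulate_superstar(n, p, seed=0):
    """Return the degree list of a Superstar tree on vertices 0 (superstar), 1, ..., n."""
    rng = random.Random(seed)
    degree = [0] * (n + 1)
    degree[0] = degree[1] = 1      # initial edge between the superstar and vertex 1
    endpoints = [1]                # non-superstar vertex v appears degree[v] times
    for new in range(2, n + 1):
        if rng.random() < p:
            target = 0                         # attach to the superstar
        else:
            target = rng.choice(endpoints)     # probability proportional to degree
            endpoints.append(target)
        degree[target] += 1
        degree[new] = 1
        endpoints.append(new)
    return degree


if __name__ == "__main__":
    n, p = 20000, 0.3
    degree = simulate_superstar(n, p, seed=42)
    print("superstar degree fraction:", degree[0] / n)
    print("largest non-superstar degree:", max(degree[1:]))
```

For large $n$ the printed fraction should be close to $p$, while the largest non-superstar degree grows much more slowly than $n$.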

7.2. Proof of the degree distribution strong law. Since $G_{n+1}$ is a connected tree, every vertex has degree at least one. Recall that $c_B(v, t)$ denoted the number of blue children of vertex $v$ by time $t$. Write $\deg(v, G_{n+1})$ for the degree of a vertex in $G_{n+1}$. The surgery operation implies that for any non-superstar vertex

\[ \deg(v, G_{n+1}) = c_B(v, \tau_n) + 1. \qquad (7.1) \]

Fixing $k \ge 0$, the number of non-superstar vertices with degree exactly $k + 1$ is the same as the number of vertices in $\mathrm{BP}(\tau_n)$ which have exactly $k$ blue children. Recall that we used $Z_{\ge k}(t)$ for the number of vertices in $\mathrm{BP}(t)$ which have at least $k$ blue children. Proposition 5.3 showed that the total number of vertices $|\mathrm{BP}(t)|$ satisfies $e^{-(2-p)t} |\mathrm{BP}(t)| \xrightarrow{a.s.} W^*/(2 - p)$. Theorem 5.5 showed that

\[ e^{-(2-p)t} Z_{\ge k}(t) \xrightarrow{a.s.} k! \prod_{i=1}^{k} \Big( i + \frac{2 - p}{1 - p} \Big)^{-1} \frac{W^*}{2 - p}. \]

Thus writing $p_{\ge k}(t) = Z_{\ge k}(t)/|\mathrm{BP}(t)|$ for the proportion of vertices with at least $k$ blue children, Theorem 5.5 implies one has

\[ p_{\ge k}(t) \xrightarrow{a.s.} k! \prod_{i=1}^{k} \Big( i + \frac{2 - p}{1 - p} \Big)^{-1} := p_{\ge k}(\infty) \]

as $t \to \infty$. Now let $k \ge 1$. Writing $N_{\ge k}(n)$ for the number of non-superstar vertices with degree at least $k$ in $G_{n+1}$, one has $N_{\ge k}(n)/n \xrightarrow{a.s.} p_{\ge k-1}(\infty)$ as $n \to \infty$. Thus the proportion of vertices with degree exactly $k$ converges to $p_{\ge k-1}(\infty) - p_{\ge k}(\infty) = \nu_{SM}(k)$. This completes the proof. □
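The limiting degree distribution can be tabulated exactly from the tail formula. The sketch below is illustrative only (the function names are ours): it computes $p_{\ge k}(\infty)$ as a product, forms $\nu_{SM}(k) = p_{\ge k-1}(\infty) - p_{\ge k}(\infty)$, and relies on the telescoping identity $\sum_{k=1}^{K} \nu_{SM}(k) = 1 - p_{\ge K}(\infty)$ as a sanity check.

```python
from fractions import Fraction


def tail(k, p):
    """p_{>=k}(infinity) = k! * prod_{i=1}^k (i + (2-p)/(1-p))^{-1}, computed exactly."""
    c = Fraction(2 - p) / Fraction(1 - p)
    out = Fraction(1)
    for i in range(1, k + 1):
        out *= Fraction(i) / (i + c)
    return out


def nu_sm(k, p):
    """Limiting proportion of non-superstar vertices with degree exactly k >= 1."""
    return tail(k - 1, p) - tail(k, p)


if __name__ == "__main__":
    p = Fraction(1, 4)
    for k in range(1, 6):
        print(k, nu_sm(k, p), float(nu_sm(k, p)))
```

Because the tail $p_{\ge K}(\infty)$ tends to zero as $K$ grows, the values $\nu_{SM}(k)$ sum to one, as a probability mass function on $\{1, 2, \ldots\}$ should.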

7.3. Proof of maximal degree asymptotics. The aim of this section is to prove Theorem 2.4. We wish to analyze the maximal non-superstar degree, which we wrote as

\[ \Upsilon_n = \max \{ \deg(v_i, G_{n+1}) : 1 \le i \le n \}. \]

The plan will be as follows: we will first prove the simpler assertion of convergence of the degree of vertex $v_k$ for fixed $k \ge 1$. Then we shall show that given any $\varepsilon > 0$, we can choose $K$ such that for large $n$, the maximal degree vertex has to be one of the first $K$ vertices $v_1, v_2, \ldots, v_K$ with probability greater than $1 - \varepsilon$. This completes the proof.

Fix $k \ge 1$. Recall from (7.1) that $\deg(v_k, G_{n+1}) = c_B(v_k, \tau_n) + 1$, where $c_B(v_k, t)$ is the number of blue vertices born to vertex $v_k$ by time $t$. Recall that $c_B(v_k, t) + 1$ is a Yule process of rate $1 - p$ started at time $\tau_k$ (i.e. at the birth of vertex $v_k$). By Lemma 5.2,

\[ \frac{c_B(v_k, t) + 1}{e^{(1-p)(t - \tau_k)}} \xrightarrow{a.s.} W_k \quad \text{as } t \to \infty, \]

where $W_k$ is an exponential random variable with mean one. By Proposition 5.3, $|\mathrm{BP}(t)| / e^{(2-p)t} \xrightarrow{a.s.} W$. Write $\gamma = (1 - p)/(2 - p)$ and let $\Delta_k = e^{-(1-p)\tau_k}\, W_k\, W^{-\gamma}$. Then we have

\[ n^{-\gamma} \deg(v_k, G_n) = \frac{c_B(v_k, \tau_{n-1}) + 1}{e^{(1-p)(\tau_{n-1} - \tau_k)}} \left( \frac{e^{(2-p)\tau_{n-1}}}{|\mathrm{BP}(\tau_{n-1})| + 1} \right)^{\gamma} e^{-(1-p)\tau_k} \xrightarrow{a.s.} \Delta_k. \]

□

Now let us prove the convergence of the maximal non-superstar degree $\Upsilon_n$. Fix $L > 0$ and let

\[ M_n[0, L] := \max \{ \deg(v_k, G_{n+1}) : \tau_k \le L \}. \qquad (7.3) \]

In words, this is the largest degree in $G_{n+1}$ amongst all vertices born before time $L$ in $\mathrm{BP}(\cdot)$. The convergence of the degree of $v_k$ for any $k \ge 1$ implies the next result.

Lemma 7.2 (Convergence near the root). Fix any $L > 0$. Then there exists a random variable $\Delta^*[0, L] > 0$ such that

\[ n^{-\gamma} M_n[0, L] \xrightarrow{a.s.} \Delta^*[0, L]. \]

Now if we can show that with high probability, $\Upsilon_n = M_n[0, L]$ for large finite $L$ as $n \to \infty$, then we are done. This is accomplished via the next Lemma. First we shall need to set up some notation. Recall that by the asymptotics for the stopping times $\tau_n$ in Lemma 7.1, given any $\varepsilon > 0$, we can choose $K_\varepsilon > 0$ such that

\[ \limsup_{n \to \infty} \mathbb{P}\Big( \Big| \tau_n - \frac{1}{2 - p} \log n \Big| > K_\varepsilon \Big) \le \varepsilon. \qquad (7.4) \]



For any $0 < L < t$, let $\mathrm{BP}(L, t]$ denote the set of vertices born in the interval $(L, t]$. Recall that we used $v_1$ for the original progenitor. For any time $t$ and $v \in \mathrm{BP}(t)$, let $\deg_v(t) = c_B(v, t) + 1$ denote the degree of vertex $v$ in the superstar model $G_{|\mathrm{BP}(t)|+1}$ obtained through the surgery procedure. For fixed $K$ and $L$, let $A_n(K, L)$ denote the event that for some time $t \in [(2 - p)^{-1} \log n \pm K]$, there exists a vertex $v$ in $\mathrm{BP}(L, t]$ with $\deg_v(t) > \deg_{v_1}(t)$.

Lemma 7.3 (Maximum occurs near the root). Given any $K$ and $\varepsilon$, one can choose $L > 0$ such that

\[ \limsup_{n \to \infty} \mathbb{P}(A_n(K, L)) \le \varepsilon. \]

In particular, given any $\varepsilon > 0$, we can choose $L$ such that

\[ \limsup_{n \to \infty} \mathbb{P}\big( \Upsilon_n \ne M_n[0, L] \big) \le \varepsilon. \]

Using Lemma 7.2 now shows that there exists a random variable $\Delta^*$ such that $\Upsilon_n / n^{\gamma} \to \Delta^*$, and this completes the proof of Theorem 2.4.

Proof of Lemma 7.3: For ease of notation, write $t_n^- = (2 - p)^{-1} \log n - K$ and $t_n^+ = (2 - p)^{-1} \log n + K$. Since the degree of any vertex is an increasing process it is enough to show that we can choose $L = L(K, \varepsilon)$ such that as $n \to \infty$, the probability that there is some vertex born in the time interval $[L, t_n^+]$ whose degree at time $t_n^+$ is larger than the degree of the root $v_1$ at time $t_n^-$ is smaller than $\varepsilon$. Let $M_{[L, t_n^+]}(t_n^+)$ denote the maximal degree by time $t_n^+$ of all vertices born in the interval $[L, t_n^+]$. Then for any constant $C$

\[ \mathbb{P}(A_n(K, L)) \le \mathbb{P}\big( \{ \deg_{v_1}(t_n^-) < C n^{\gamma} \} \cup \{ M_{[L, t_n^+]}(t_n^+) > C n^{\gamma} \} \big) \]

\[ \le \mathbb{P}\big( \deg_{v_1}(t_n^-) < C n^{\gamma} \big) + \mathbb{P}\big( M_{[L, t_n^+]}(t_n^+) > C n^{\gamma} \big). \]

Since the offspring distribution of $v_1$ is a rate $(1 - p)$ Yule process,

\[ e^{-(1-p) t_n^-} \deg_{v_1}(t_n^-) = e^{(1-p)K}\, n^{-\gamma} \deg_{v_1}(t_n^-) \xrightarrow{a.s.} W_{v_1}, \]

where $W_{v_1}$ has an exponential distribution. Thus for a fixed $K$, we can choose $C = C(\varepsilon)$ small enough such that

\[ \limsup_{n \to \infty} \mathbb{P}\big( \deg_{v_1}(t_n^-) < C n^{\gamma} \big) \le \varepsilon/2. \]

Thus for fixed $\varepsilon, C, K$, it is enough to choose $L$ large such that

\[ \limsup_{n \to \infty} \mathbb{P}\big( M_{[L, t_n^+]}(t_n^+) > C n^{\gamma} \big) \le \varepsilon/2. \]

Without loss of generality, we shall assume $L$ and $t_n^+$ are integers. For any integer $L \le m \le t_n^+ - 1$, let $M_{[m, m+1]}(t_n^+)$ denote the maximum degree by time $t_n^+$ of all vertices born in the interval $[m, m+1]$. Then

\[ M_{[L, t_n^+]}(t_n^+) = \max_{L \le m \le t_n^+ - 1} M_{[m, m+1]}(t_n^+). \]

Let $|\mathrm{BP}[m, m+1]|$ denote the number of vertices born in the time interval $[m, m+1]$. Since for a vertex born at some time $s < t_n^+$, the degree of the vertex at time $t_n^+$ has distribution $\mathrm{Yu}_{1-p}(t_n^+ - s)$, an application of the union bound yields

\[ \mathbb{P}\big( M_{[L, t_n^+]}(t_n^+) > C n^{\gamma} \big) \le \sum_{m=L}^{t_n^+ - 1} \mathbb{E}(|\mathrm{BP}[m, m+1]|)\, \mathbb{P}\big( \mathrm{Yu}_{1-p}(t_n^+ - m) > C n^{\gamma} \big). \]

Now $\mathbb{E}(|\mathrm{BP}[m, m+1]|) \le \mathbb{E}(|\mathrm{BP}(m+1)|)$. By Proposition 5.3, $\mathbb{E}(|\mathrm{BP}(t)|) \le e^{(2-p)t}$. Further by Lemma 5.2, for fixed time $s$, a rate $1 - p$ Yule process has a geometric distribution with parameter $e^{-(1-p)s}$. Thus, for a constant $A$, we have

\[ \mathbb{P}\big( M_{[L, t_n^+]}(t_n^+) > C n^{\gamma} \big) \le A \sum_{m=L}^{t_n^+ - 1} e^{(2-p)m} \big( 1 - e^{-(1-p)(t_n^+ - m)} \big)^{C n^{\gamma}} \]

\[ \le A \sum_{m=L}^{t_n^+ - 1} e^{(2-p)m - C e^{(1-p)(m - K)}}, \]

where the last inequality follows from the fact that for $0 < x < 1$, $1 - x \le e^{-x}$ and $e^{(1-p) t_n^+} = n^{\gamma} e^{(1-p)K}$. Now choosing $L$ large, one can make the right hand side of the last inequality as small as one desires and this completes the proof.

□

7.4. Proof of logarithmic height scaling. The aim of this section is to complete the proof of Theorem 2.5. Let us first understand the relationship between the distances in $\mathrm{BP}(\tau_n)$ and $G_{n+1}$ due to the surgery operation. The distance of all the red vertices in $\mathrm{BP}(\tau_n)$ from the superstar $v_0$ is one. For each blue vertex $v \in \mathrm{BP}(\tau_n)$, let $r(v)$ denote the first red vertex on the path from $v$ to the root $v_1$ in $\mathrm{BP}(\tau_n)$. Recall from Section 5.4 that $d(v)$ denoted the number of edges on the path between $v$ and $r(v)$, with $d(v) = 0$ if $v$ was a red vertex. Then the distance of this vertex from the superstar $v_0$ in $G_{n+1}$ is just $d(v) + 1$, since the vertex needs $d(v)$ steps to get to $r(v)$, which is then directly connected to $v_0$ in $G_{n+1}$ by an edge. Let $D(u, v)$ denote the graph distance between vertices $u$ and $v$ in $G_{n+1}$. Since by convention $d(v) = 0$ for all the red vertices, this argument shows that for all $v \ne v_0$ in $G_{n+1}$, $D(v, v_0) = d(v) + 1$. In particular the height of $G_{n+1}$ is given by

\[ \mathcal{H}(G_{n+1}) = \max \{ d(v) + 1 : v \in \mathrm{BP}(\tau_n) \}. \qquad (7.5) \]

Now by the definition of $\mathcal{H}(G_{n+1})$, there is a vertex in $\mathrm{BP}(\tau_n)$ such that $d(v) = \mathcal{H}(G_{n+1}) - 1$ but no vertex with $d(v) = \mathcal{H}(G_{n+1})$. Recall the stopping times $\mathrm{Bir}(k)$, defined as the first time a vertex with $d(v) = k$ is born in $\mathrm{BP}(\cdot)$. Thus we have

\[ \mathrm{Bir}(\mathcal{H}(G_{n+1}) - 1) \le \tau_n < \mathrm{Bir}(\mathcal{H}(G_{n+1})). \qquad (7.6) \]

Now recall that Theorem 5.14 showed that the stopping times $\mathrm{Bir}(k)$ satisfy $\mathrm{Bir}(k)/k \xrightarrow{a.s.} W(1/e)/(1 - p)$ as $k \to \infty$. Dividing (7.6) throughout by $\mathcal{H}(G_{n+1})$ we have

\[ \frac{\mathrm{Bir}(\mathcal{H}(G_{n+1}) - 1)}{\mathcal{H}(G_{n+1})} \xrightarrow{a.s.} \frac{W(1/e)}{1 - p}, \qquad \frac{\tau_n}{\log n} \xrightarrow{a.s.} \frac{1}{2 - p}. \]

Here the first assertion follows by Theorem 5.14 while the second assertion follows from Lemma 7.1, which described asymptotics for the stopping times $\tau_n$. Rearranging shows that

\[ \frac{\mathcal{H}(G_{n+1})}{\log n} \xrightarrow{a.s.} \frac{1 - p}{W(1/e)\,(2 - p)}. \]

This completes the proof. □

Appendix

Below we describe each of the thirteen events and show the corresponding event-specific term.

E = 1: Brazil vs Netherlands soccer match from the 2010 World Cup. The term is "Brazil" or "Netherlands".

E = 2: Basketball player LeBron James's announcement of signing with the Miami Heat. The term is "Lebron".

E = 3: The 2010 World Cup Kick-Off Celebration Concert. The term is "World Cup".

E = 4: Brazil vs Portugal soccer match from the 2010 World Cup. The term is "Brazil" or "Portugal".

E = 5: Italy vs Slovakia soccer match from the 2010 World Cup. The term is "Italy" or "Slovakia".

E = 6: The 2010 BET Awards show. The term is "BET Awards".

E = 7: The firing of General Stanley McChrystal by US President Barack Obama. The term is "McChrystal".

E = 8: The 2010 World Cup Opening Ceremony. The term is "World Cup".

E = 9: Mexico vs South Africa soccer match from the 2010 World Cup. The term is "Mexico".

E = 10: England vs Slovakia soccer match from the 2010 World Cup. The term is "England".

E = 11: Portugal vs North Korea soccer match from the 2010 World Cup. The term is "Portugal".

E = 12: Roger Federer's tennis match in the first round of the 2010 Wimbledon tournament. The term is "Federer".

E = 13: The UN imposing sanctions on Iran. The term is "Iran".



